Abstract

We develop a multivariate cure survival model to estimate lifetime patterns of colorectal cancer screening. Screening data cover long periods of time, with sparse observations for each person. Some events may occur before the study begins or after the study ends, so the data are both left- and right-censored, and some individuals are never screened (the “cured” population). We propose a multivariate parametric cure model that can be used with left- and right-censored data. Our model allows for the estimation of the time to screening as well as the average number of times individuals will be screened. We calculate likelihood functions based on the observations for each subject using a distribution that accounts for within-subject correlation, and estimate parameters using Markov chain Monte Carlo methods. We apply our methods to the estimation of lifetime colorectal cancer screening behavior in the SEER-Medicare data set.
1 Introduction

Colorectal cancer (CRC) is the third most common cancer in the United States, in both incidence and mortality [1]. Because this cancer is largely asymptomatic, it is important for individuals to be screened regularly. Screening can not only detect colorectal cancer at earlier stages, but can also detect pre-cancerous polyps, which can be removed [1]. The effectiveness of CRC screening led the U.S. Preventive Services Task Force to set screening guidelines in 1996 [30]. In this work, we focus on colonoscopy screening, which is currently recommended once every ten years for average-risk individuals starting at age 50. Although it is more expensive and riskier than other screening methods, colonoscopy is the most thorough form of screening: it examines the entire colon with few false negatives or false positives, and polyps and even some cancers can be removed during the examination [39]. Little is known about lifetime colonoscopy screening behavior, because it is difficult to estimate from incomplete data. Many individuals are not observed for their entire period of eligibility, so screenings that occur before or after the study period may be censored. Additionally, individuals can have zero to many screenings in a lifetime, and can be screened immediately upon becoming due or delay screening for varying lengths of time.

Previous studies have examined rates of colorectal cancer screening in different populations, including the Medicare population (e.g., [39, 32, 33, 14]). However, while these studies report that screening and adherence rates are low, their methods do not account for screening behavior that may have occurred before or after the study observation period. In addition, these studies do not quantify the average amount of time individuals wait between screenings, or how many screenings individuals receive in a lifetime. Because of this lack of information, it has been difficult for researchers to confirm optimal screening guidelines. While screening for colorectal cancer reduces cancer risk [1, 30, 39], the colonoscopy procedure itself carries risk and requires a trained specialist [24, 6], and unnecessary screenings place an avoidable financial burden on the Medicare system (for example, see [37]). Without knowledge of lifetime screening patterns, it has been difficult to perform long-term cost-benefit analyses for outcomes in colorectal cancer. Given the importance of determining lifetime colonoscopy screening behavior, we develop a multivariate survival model that allows a proportion of subjects to never be screened, and we use this model to estimate patterns in lifetime colonoscopy screening behavior.

*Yolanda Hagar is a postdoctoral researcher in Applied Mathematics, University of Colorado at Boulder. Danielle Harvey and Laurel Beckett are faculty in the Department of Public Health Sciences, University of California, Davis. Correspondence email: yolanda.hagar@colorado.edu


2 SEER-Medicare Data 

We used the SEER-Medicare data set to quantify lifetime colonoscopy screening behavior. This large, public data set is a linkage between the Surveillance, Epidemiology and End Results (SEER) program of cancer registries and Medicare claims files, and is one of the largest and most complete data sets containing colonoscopy screening information [29]. However, subjects in the SEER-Medicare data set are age 65 or older, and were only observed between 1991 and 2003. Screenings that occurred before age 65, before 1991, or after 2003 were not observed, so lifetime behavior was left- and/or right-censored for many individuals. Additionally, some individuals were never screened by colonoscopy, while others were screened more than once in a lifetime. Figure 1 illustrates the complexities of screening behavior. Although the actual colonoscopy screening behaviors of hypothetical subjects A and B differ, their observed trajectories are the same. Similarly, hypothetical subjects C and D have identical observed screening patterns but different true lifetime behaviors. Statistical methods that account for screening that occurs outside the observation window are therefore necessary for proper estimation of lifetime screening behavior.

In addition to estimating the timing and rates of colonoscopy screening, we are also interested in the impact of health policy changes on screening behavior. Medicare changed its coverage rules in 1998 to increase coverage of colonoscopy screening. Before 1998, no colonoscopy screenings were covered by Medicare (“phase 0”). Between 1998 and June 30, 2001 (“phase 1”), colonoscopy screenings were covered for high-risk individuals (e.g., those with a family history of colorectal cancer), and starting July 1, 2001 (“phase 2”), coverage was provided to all Medicare patients, regardless of risk level. Previous work has shown that an increase in Medicare colonoscopy coverage led to an increase in screenings [14]; however, this has not been examined in the context of lifetime screening patterns or of the time to being screened. Understanding the impact of changes in guidelines is important for understanding barriers to screening and patterns of screening behavior.

The model we propose addresses a number of open questions: it quantifies the number of times individuals are screened in a lifetime and the length of time individuals wait between screenings, and it quantifies how insurance policy changes affect screening behavior. To do this, we develop and implement a multivariate cure model that accounts for both left- and right-censoring and for within-subject correlation, and that estimates multiple event times and the average number of events per person. This model is particularly well suited to lifetime colorectal cancer screening, because these events are sparse over the course of an individual’s lifetime and span long periods of time. The resulting estimates are robust despite the left-censoring, the right-censoring, and the number of subjects who are never screened. The rest of this article is organized as follows. In Section 3, we describe the multivariate survival methodology we have developed for left- and right-censored data, including the MCMC algorithm used for parameter estimation, and how covariates are incorporated into the model through the parameters. In Section 4, we present results of a simulation study designed to validate our approach in settings similar to the SEER-Medicare data. In Section 5, we use our methodology to estimate colonoscopy screening behavior in the Medicare population using the SEER-Medicare data set, assuming individuals can have up to two screenings in a lifetime. Section 6 contains a discussion and concluding remarks.

Figure 1: Four hypothetical lifetime colonoscopy screening trajectories, and the observed information (unshaded region) available from the SEER-Medicare data set. In the figure, a circle denotes the time an individual becomes due for a screening (either at age 50 or 10 years after the previous screening), and an ‘X’ denotes the time of a colonoscopy screening. Solid lines represent periods when the subject is overdue for a colonoscopy (called a “lag time”), and dashed lines represent the ten-year period during which average-risk subjects are not due for a screening. Among the hypothetical trajectories, subject A is screened twice, once before 1991 and again in the observation window. Subject B has only one screening, which is observed. However, based on the information in the observation window, it is not possible to tell that the lifetime trajectory of subject A differs from that of subject B. Similarly, neither subject C nor subject D has an observed screening, but subject C is screened after the study ends, while subject D is not. In this example, all four hypothetical subjects are both left- and right-censored.


3 Model 

Many methods have been used to examine colonoscopy screening behavior (e.g., [39, 32, 33, 14]). These studies examined screening rates using simple approaches, such as counting the number of colonoscopy screenings that occur each year, or more sophisticated approaches such as Poisson regression [14]. While these approaches provide basic information on colonoscopy screening trends, quantifying lifetime screening behavior requires a model that accounts for both the left- and right-censoring inherent in the SEER-Medicare data set and allows some people to never be screened. Below, we formulate a multivariate survival model that estimates how long people wait between screenings, as well as the average number of screenings individuals receive in a lifetime.

3.1 Background 

Cure models allow for the estimation of time to an event when a subset of the population is not at risk and will never experience the event. In the estimation of lifetime screening behavior, the event of interest is a colonoscopy screening, and individuals who will never be screened belong to the subpopulation that will never experience the event; hence they are “cured” in traditional cure-model terminology. The time to event is the time an average-risk individual waits between becoming due for a screening (at age 50 or 10 years after the previous colonoscopy screening) and actually being screened, which we refer to as the “lag time”. (Lag times are depicted by solid lines in Figure 1.)
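As a concrete reading of this definition, the sketch below converts the ages at which one average-risk person is screened into lag times. It is a minimal illustration under the text's assumptions (due at age 50, then again 10 years after each screening); the function name `lag_times` and the choice to reject a screening that occurs before the person is due are our own, not part of the model.

```python
def lag_times(screening_ages, start_age=50.0, interval=10.0):
    """Return the lag times (years overdue) for one average-risk person.

    The person first becomes due at `start_age` and becomes due again
    `interval` years after each screening, as described in the text.
    """
    lags = []
    due = start_age
    for age in sorted(screening_ages):
        if age < due:
            # Hypothetical guard: an early screening falls outside the average-risk definition.
            raise ValueError(f"screening at age {age} precedes due age {due}")
        lags.append(age - due)
        due = age + interval
    return lags


print(lag_times([55.0, 67.0]))  # [5.0, 2.0]
```

The second lag is 2 years because the person became due again at 65, ten years after the first screening.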

Cure models have been examined at length. Initial work by Boag [3] presents the mixture model $S_{pop}(t;\psi) = \pi + (1-\pi)S(t;\psi)$, where $\pi$ is the cured proportion, $S(t;\psi)$ represents the survival function for individuals who will experience the event, and $\lim_{t\to\infty} S_{pop}(t;\psi) = \pi$. This model has been studied extensively (see, e.g., [2, 15, 16, 7, 13, 27, 35]), but it can be computationally complex and is difficult to extend to the multivariate case.
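As a small illustration of Boag's mixture, the sketch below evaluates $S_{pop}(t) = \pi + (1-\pi)S(t)$ with an exponential $S(t)$. The rate of 0.5 per year and the cure fraction of 0.3 are hypothetical values chosen only to show that $S_{pop}(t)$ starts at 1 and levels off at $\pi$ instead of falling to 0; the function names are ours.

```python
import math


def cure_mixture_survival(t, pi, survival):
    """Population survival S_pop(t) = pi + (1 - pi) * S(t) of a mixture cure model.

    `pi` is the cured (never-screened) fraction and `survival` is the survival
    function of those who eventually experience the event.
    """
    return pi + (1.0 - pi) * survival(t)


def exp_survival(t, rate=0.5):
    """Illustrative exponential survival function with a hypothetical rate."""
    return math.exp(-rate * t)


for t in (0.0, 5.0, 50.0):
    print(t, round(cure_mixture_survival(t, pi=0.3, survival=exp_survival), 4))
```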

In addition to the possibility that some individuals may never be screened, some individuals may receive many screenings in a lifetime, so the possibility of multiple, dependent events (i.e., colonoscopy screenings) must be accounted for. Many multivariate survival models have been studied for this purpose. A large body of work uses Cox models and a marginal hazards approach to investigate the effects of covariates on the hazard rate(s), such as that by Wei et al. [38], Liang et al. [25], Lin [26], Prentice and Hsu [31], Spiekerman and Lin [34], and others. These models obtain population-averaged covariate effects, but are mainly attractive when the correlation between observations is not of interest. Hougaard has done much work with frailty terms in multivariate survival and competing-risks models [20, 18, 17, 19, 21, 22]; however, these models do not include the possibility of a cured population. Extensive work on multivariate survival models has been done by Chen, Ibrahim, and Sinha [5, 23], which perhaps matches our work most closely, as some of their models allow for a cured population with right-censored data. In this work, the authors integrate over latent variables representing the number of risks for each subject, as well as a frailty parameter (to account for within-subject correlation), to obtain a likelihood function that accommodates multiple events as well as a proportion of subjects who are cured. However, in addition to right-censoring, we are also interested in left-censoring, and require a model that incorporates this type of missingness.

We introduce a type of multivariate cure model that allows for the estimation of multiple lag times for each 
individual and the probability that an individual will receive zero through many lifetime screenings. Correlation 
between lag times is accounted for with a frailty term. 
3.2 Notation


Assume an individual has $M$ screenings in his or her lifetime, where $M$ is a random variable such that $0 \le M \le \ell < \infty$, with $\ell$ denoting the maximum number of lifetime screenings possible for any individual. (The time frame for colorectal cancer screening is finite, so the assumption $\ell < \infty$ is natural.) The probability that an individual will have $m$ lifetime screenings, i.e., $P(M = m)$, is equal to $\theta_m$, $m = 1, \ldots, \ell$, and the probability that an individual will never be screened is $\theta_0 = 1 - \sum_{m=1}^{\ell} \theta_m$. If individuals are left-censored (i.e., they enter the study after eligibility begins) and/or right-censored (i.e., they leave the study before eligibility ends), we may only observe $k$ of the $M_i = m$ screenings, where $0 \le k \le m \le \ell$, possibly obscuring part of the lifetime screening pattern. We estimate the lag times and the probability of individuals receiving $M$ ($M = 0, \ldots, \ell$) lifetime screenings via the likelihood function and the multivariate survival distribution developed by Hougaard [18].

For individuals who receive at least one screening, let $Y_{ij}$ represent the $j$th lag time for the $i$th subject ($j = 1, \ldots, M_i$, $i = 1, \ldots, n$). The lag times for each individual are correlated through a subject-specific frailty quantity, but, conditional on the frailty, the lag times within each subject are independent. Let $Z_i$ be the frailty for the $i$th subject. Assume the $Z_i$'s are independent and follow a positive stable distribution with parameter $\alpha$, given by the Laplace transform

$$E\{\exp(-sZ)\} = \exp(-s^{\alpha}),$$

with $\alpha \in (0, 1]$. The case $\alpha = 1$ corresponds to independent observations within each subject. For the $i$th subject, conditional on $M$ and $Z_i = z$, the joint distribution of the lag times is then

$$P(Y_{i1} > y_{i1}, \ldots, Y_{iM} > y_{iM} \mid M, Z_i = z) = \exp\{-z(\Lambda_1(y_{i1}) + \cdots + \Lambda_M(y_{iM}))\},$$

where $\Lambda_j(\cdot)$ is the cumulative hazard of the $j$th lag time. Integrating over the frailty, with $p(z)$ the positive stable density, the multivariate survival distribution for all individuals, given $M$, becomes

$$\begin{aligned}
P(Y_1 > y_1, \ldots, Y_M > y_M \mid M) &= \int_z \exp\{-z(\Lambda_1(y_1) + \cdots + \Lambda_M(y_M))\}\, p(z)\, dz \\
&= \exp\{-(\Lambda_1(y_1) + \cdots + \Lambda_M(y_M))^{\alpha}\} \\
&= S(y_1, \ldots, y_M \mid M).
\end{aligned} \tag{1}$$
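Equation (1) is easy to evaluate directly. The sketch below (function names are ours) computes the joint survival and the implied marginal $\exp\{-\Lambda(y)^{\alpha}\}$, obtained by setting the other arguments to zero. Because $(a+b)^{\alpha} \le a^{\alpha} + b^{\alpha}$ for $a, b \ge 0$ and $\alpha \in (0,1]$, the joint survival is never smaller than the product of the marginals, with equality at $\alpha = 1$; the example prints both for an illustrative pair of cumulative hazards.

```python
import math


def joint_survival(cum_hazards, alpha):
    """Equation (1): S(y_1, ..., y_M | M) = exp{-(sum_j Lambda_j(y_j))**alpha}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return math.exp(-(sum(cum_hazards) ** alpha))


def marginal_survival(cum_hazard, alpha):
    """Marginal of one lag time: the other Lambda_j(0) = 0, so exp{-Lambda(y)**alpha}."""
    return joint_survival([cum_hazard], alpha)


cum_hazards = [1.0, 2.0]  # illustrative values of Lambda_1(y_1) and Lambda_2(y_2)
for alpha in (0.5, 1.0):
    joint = joint_survival(cum_hazards, alpha)
    product = marginal_survival(1.0, alpha) * marginal_survival(2.0, alpha)
    print(alpha, round(joint, 4), round(product, 4))
```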

Using this multivariate survival function, we can calculate the probability of a colonoscopy screening occurring before, at, or after certain time points. For example, the probability of a colonoscopy screening occurring at time $y_1$, with the remaining lag times exceeding $y_2, \ldots, y_M$, is calculated as

$$P(Y_1 = y_1, Y_2 > y_2, \ldots, Y_M > y_M \mid M) = -\frac{\partial}{\partial y_1} S(y_1, \ldots, y_M \mid M), \tag{2}$$

and the probability of observing all $M$ screenings can be calculated as

$$P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_M = y_M \mid M) = (-1)^M \frac{\partial^M}{\partial y_1 \cdots \partial y_M} S(y_1, \ldots, y_M \mid M). \tag{3}$$

These calculations are akin to finding the density $f(y) = -dS(y)/dy$ in the univariate case. In our notation, we use $P(Y = y)$ to represent $f(y)$. Although this probability is zero for a continuous random variable, we use it as shorthand for the density of an observed colonoscopy screening at time $y$.
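For exponential lag times with $M = 2$, equations (2) and (3) have closed forms. The sketch below (illustrative parameter values; names are ours) writes them out and compares them with central finite differences of $S$, an independent check that the derivatives and the sign $(-1)^2 = +1$ are right.

```python
import math

ALPHA, LAM1, LAM2 = 0.9, 0.5, 0.7  # illustrative values, not estimates


def S(y1, y2):
    """Equation (1) with exponential lag times: exp{-(lam1*y1 + lam2*y2)**alpha}."""
    return math.exp(-((LAM1 * y1 + LAM2 * y2) ** ALPHA))


def density_first(y1, y2):
    """Equation (2): -dS/dy1, a screening at y1 with the second lag time still > y2."""
    u = LAM1 * y1 + LAM2 * y2
    return ALPHA * LAM1 * u ** (ALPHA - 1) * S(y1, y2)


def density_joint(y1, y2):
    """Equation (3) with M = 2: (-1)**2 * d^2 S / (dy1 dy2)."""
    u = LAM1 * y1 + LAM2 * y2
    return LAM1 * LAM2 * S(y1, y2) * (
        ALPHA ** 2 * u ** (2 * ALPHA - 2) + ALPHA * (1 - ALPHA) * u ** (ALPHA - 2)
    )


# Central finite differences as an independent check of the closed forms.
h, y1, y2 = 1e-4, 1.5, 2.0
fd_first = -(S(y1 + h, y2) - S(y1 - h, y2)) / (2 * h)
fd_joint = (S(y1 + h, y2 + h) - S(y1 + h, y2 - h)
            - S(y1 - h, y2 + h) + S(y1 - h, y2 - h)) / (4 * h * h)
print(density_first(y1, y2), fd_first)
print(density_joint(y1, y2), fd_joint)
```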





3.3 Likelihood Function 

Because the observed screening colonoscopies are sparse, we use Markov Chain Monte Carlo (MCMC) sampling 
for parameter estimation. The posterior distribution for each parameter is calculated using the likelihood function, 
which is formulated below. 

Denote the observed data for individual $i$ by $W_i = (t_{Li}, t_{Ri}, t_{i1}, \ldots, t_{ik_i})$, where $t_{Li}$ and $t_{Ri}$ represent the left- and right-censoring times (with $t_{Li} = 0$ if an individual is not left-censored, and $t_{Ri}$ equal to the end of the observation period if the individual is not right-censored), and $t_{i1}, \ldots, t_{ik_i}$ denote the $k_i$ observed screening times, with $0 \le k_i \le m_i \le \ell$. The case $k_i = 0$ denotes that no colonoscopy screenings were observed in the study period. Using the multivariate survival distribution in equation (1), we can then write the complete-data likelihood function as follows:


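$$\begin{aligned}
L(W, \eta \mid \phi, \theta, \alpha) &= \prod_{i=1}^{n} \prod_{j=0}^{\ell} \left[ P(M = j)\, p_{ij}(W_i \mid \phi_j, \alpha, M = j) \right]^{\eta_{ij}} \\
&= \prod_{i=1}^{n} \prod_{j=0}^{\ell} \left[ \theta_j\, p_{ij}(W_i \mid \phi_j, \alpha, M = j) \right]^{\eta_{ij}},
\end{aligned} \tag{4}$$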

where $\phi_j$ denotes the parameter vector for the survival distribution of the $j$ lag times when $M = j$, and $\eta_{ij} = I_i(M = j)$ is an indicator variable that equals one if subject $i$ receives $j$ colonoscopies in a lifetime (with $\eta_{i0} = 1 - \sum_{j=1}^{\ell} \eta_{ij}$). The $p_{ij}(W_i \mid \phi_j, \alpha, M = j)$ are probabilities associated with the person having $j$ lifetime screenings, and are calculated from the multivariate survival function in equation (1) based on the screening pattern observed for the individual. Examples of the calculation of these probabilities are given in Sections 3.4 and 3.5. For the case of zero lifetime screenings (i.e., $M = 0$), the probability $p_{i0}(\cdot)$ is not defined, and the only likelihood contribution in this instance is $\theta_0$.
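The complete-data likelihood (4) can be evaluated on the log scale as below. This is a sketch with hypothetical names: `eta` may hold known indicators or their current expected values, and the $j = 0$ entry of `p` is ignored so that a subject with $M = 0$ contributes $\theta_0$ alone, as stated above.

```python
import math


def complete_data_loglik(eta, theta, p):
    """Log of equation (4).

    eta[i][j] : eta_ij, the indicator (or its current expected value) that
                subject i has j lifetime screenings, j = 0..ell
    theta[j]  : P(M = j)
    p[i][j]   : p_ij(W_i | phi_j, alpha, M = j); the j = 0 entry is ignored,
                so an M = 0 contribution is theta_0 alone.
    """
    total = 0.0
    for eta_i, p_i in zip(eta, p):
        for j, e in enumerate(eta_i):
            if e == 0.0:
                continue  # a factor raised to the power 0 equals 1
            log_p = 0.0 if j == 0 else math.log(p_i[j])
            total += e * (math.log(theta[j]) + log_p)
    return total


# Hypothetical example with ell = 1: subject 1 screened (p_i1 = 0.2), subject 2 never.
eta = [[0.0, 1.0], [1.0, 0.0]]
theta = [0.4, 0.6]
p = [[None, 0.2], [None, 0.05]]
print(complete_data_loglik(eta, theta, p))
```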

For subjects who are left- and/or right-censored and who do not have $\ell$ observed screenings, some or all of $\eta_i = (\eta_{i0}, \ldots, \eta_{i\ell})$ may be unobserved. The unobserved indicators $\eta_{ij}$ are replaced with their expected values, which are calculated at each iteration of the MCMC routine using the current sampled parameter values. Examples of this calculation are shown in Sections 3.4 and 3.5.


3.4 Example: Univariate likelihood 

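To illustrate our model in its simplest form, we first consider the univariate case of $\ell = 1$ (i.e., individuals can receive at most one colonoscopy in a lifetime). In the univariate case, the likelihood function in (4) simplifies to

$$\begin{aligned}
L(W, \eta \mid \theta, \phi) &= \prod_{i=1}^{n} \prod_{j=0}^{1} \left[ P(M = j)\, p_{ij}(W_i \mid \phi_j, M = j) \right]^{\eta_{ij}} \\
&= \prod_{i=1}^{n} \left[ \theta\, p_{i1}(W_i \mid \phi_1, M = 1) \right]^{\eta_i} (1 - \theta)^{1 - \eta_i},
\end{aligned} \tag{5}$$

where $\theta = \theta_1$ and $\eta_i = \eta_{i1}$.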

In the univariate model, subject $i$ who has an observed screening at time $t_{i1}$ contributes $\theta \times p_{i1}(W_i \mid \phi_1, M = 1) = \theta \times f(t_{i1} \mid \phi_1)$ to the likelihood function (as $\eta_i$ is known and equal to 1). Conversely, subjects who have no observed screenings and who are neither left- nor right-censored contribute $(1 - \theta)$ to the likelihood function, with a known $\eta_i = 0$. However, a left-censored subject with no observed screening, who enters the study at time $t_{iL}$, does not have complete information: it is not known whether a screening occurred before time $t_{iL}$ or did not occur at all. In this case, $p_{i1}(W_i \mid \phi_1, M = 1) = F(t_{iL} \mid \phi_1)$, accounting for a possible screening before study entry.

The value of $\eta_i$ is unknown and is updated at each iteration of the Gibbs sampler with its expected value

$$E(\eta_i) = \frac{\theta\, F(t_{iL} \mid \phi_1)}{\theta\, F(t_{iL} \mid \phi_1) + (1 - \theta)}.$$

This expectation is a number between 0 and 1, and therefore assigns a weight to the probability that subject $i$ was screened once versus never screened, based on when subject $i$ entered the study. Similar calculations can be made for right-censored or left- and right-censored subjects. Note that in the univariate likelihood, the within-subject correlation parameter $\alpha$ is not needed, as each subject has at most one screening.
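For exponential lag times, the weight for a left-censored subject with no observed screening can be computed as below. The exponential form of $F$ is an assumption for illustration (it matches the lag-time distribution adopted in Section 3.6), and the function name is hypothetical. The weight is the share of the probability of the observed data that comes from the “screened once” trajectory.

```python
import math


def expected_eta_left_censored(theta, rate, t_left):
    """E(eta_i) for a left-censored subject with no observed screening (ell = 1).

    Screened once (probability theta) is compatible with the data only if the
    screening happened before entry, probability F(t_left); never screened
    (probability 1 - theta) is always compatible. Exponential lag times are an
    assumption here: F(t) = 1 - exp(-rate * t).
    """
    F = 1.0 - math.exp(-rate * t_left)
    return theta * F / (theta * F + (1.0 - theta))


for t_left in (0.0, 2.0, 10.0):
    print(t_left, round(expected_eta_left_censored(0.5, 0.5, t_left), 4))
```

Later entry leaves more time for an unobserved screening, so the weight on “screened once” grows with $t_{iL}$.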

The univariate likelihood is similar to the complete-data likelihood presented in Sy and Taylor [36] for right-censored subjects. This formulation of the cure model does not reduce to the standard cure model as $t \to \infty$; however, it allows more flexibility in the estimation of left-censored screening behavior.


3.5 Example Subject 

We now extend the univariate example to the multivariate case of $\ell = 2$. To prevent identifiability issues, and because the observed screenings are sparse, applying the multivariate model typically requires the following assumptions:

• A maximum recommended screening age exists. Since colonoscopy is not risk-free and its benefits are long-term (due to the slow progression of colorectal cancer) [39], colonoscopy screening is generally not recommended for older average-risk patients [30].

• The maximum number of lifetime screening colonoscopies, $\ell$, is fixed. In this example, we set $\ell = 2$ (the maximum number observed in the SEER-Medicare data set for average-risk patients).

Using these assumptions, we examine the following subject: individual $i$ is left-censored at time $t_{iL}$, is observed until the maximum screening age (i.e., is not right-censored), and has one screening observed at time $t_{i1}$. Because the individual is left-censored and fewer than two screenings were observed, there are two possible true trajectories for this subject (see Figure 2):

1. The one observed screening is the only lifetime screening. 

2. A screening also occurred before observation of the individual began, and so the subject was screened twice in 
his or her lifetime. 


Because there are two possible trajectories, this subject contributes to the likelihood function a weighted combination of the probability that only one lifetime screening occurred and the probability that two lifetime screenings occurred. The probabilities for both cases are calculated using the distribution in equation (1). The probability for case 1 (trajectory 1 in Figure 2) is calculated under the assumption that the observed screening is the only screening, and is written as


$$p_{i1}(W_i \mid \phi_1, M = 1) = P(Y_1 = t_{i1} \mid \phi_1, M = 1) = -\left. \frac{\partial}{\partial y_1} S(y_1 \mid \phi_1, M = 1) \right|_{y_1 = t_{i1}}. \tag{6}$$







Figure 2: Two possible trajectories for a left-censored subject who enters the observation period at time $t_L$ and has one observed screening at time $t_1$. Lag times are denoted with solid lines and represent periods when the subject is due for a screening; the ten-year post-screening period is denoted with a dashed line. Screenings are marked with ‘X’, and time points when the subject becomes due for a screening are marked with a circle. In this example, it is possible that (1) the one observed screening is the only lifetime screening, or (2) the subject had two lifetime screenings, the first occurring before the left-censoring time $t_L$ and the second being observed. In the first case, the length of the first lag time, $y_1$, is equal to the time to the first screening, $t_1$, and the probability can be written as $P(Y_1 = t_1 \mid \phi_1, M = 1)$. In the second case, we can only determine that the time to the first screening, $y_1$, is less than the left-censoring time, and the second lag time, $y_2$, is the remaining time between the previous screening and the observed screening ($t_1 - 10 - y_1$). This can be written as $P(Y_1 < t_L, Y_2 = t_1 - 10 - y_1 \mid \phi_2, \alpha, M = 2)$. This subject is not right-censored because he or she reaches the maximum screening age before the observation period ends.










The probability for the second case (represented by trajectory 2 in Figure 2) is calculated under the assumption 
that one screening occurred before the left-censoring time, and the second screening is observed (i.e. M = 2), and is 
written as: 


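$$\begin{aligned}
p_{i2}(W_i \mid \phi_2, \alpha, M = 2) &= P(Y_1 < t_{iL}, Y_2 = t_{i1} - 10 - y_1 \mid \phi_2, \alpha, M = 2) \\
&= \int_0^{t_{iL}} P(Y_1 = y_1, Y_2 = t_{i1} - 10 - y_1 \mid \phi_2, \alpha, M = 2)\, dy_1.
\end{aligned} \tag{7}$$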


Note that we do not actually observe $y_1$ (the length of the first lag time); we only know that it is less than the left-censoring time $t_{iL}$. Similarly, we do not observe $y_2$, the length of the second lag time; we only know that $y_2$ equals the time between the first screening at time $y_1$ and the second screening at $t_{i1}$, minus the 10-year waiting period.

Because only one screening is observed, the values of $\eta_{i1}$ and $\eta_{i2}$ are not known (although it is known that $\eta_{i0} = 0$), and they are therefore estimated iteratively, replacing the unknown values with their expected values computed from the sampled parameter values at each MCMC iteration. In this example, the expected values at the $r$th MCMC iteration are calculated as

$$\hat{\eta}_{i2}^{(r)} = \frac{\theta_2^{(r)}\, p_{i2}(W_i \mid \phi_2^{(r)}, \alpha^{(r)}, M = 2)}{\theta_1^{(r)}\, p_{i1}(W_i \mid \phi_1^{(r)}, M = 1) + \theta_2^{(r)}\, p_{i2}(W_i \mid \phi_2^{(r)}, \alpha^{(r)}, M = 2)}, \qquad \hat{\eta}_{i1}^{(r)} = 1 - \hat{\eta}_{i2}^{(r)},$$

where $p_{i1}(\cdot)$ and $p_{i2}(\cdot)$ are calculated using equations (6) and (7).
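For the example subject, equations (6) and (7) and the expected indicators can be evaluated numerically once a lag-time distribution is fixed. The sketch below assumes exponential lag times (as in the SEER-Medicare analysis), uses `scipy.integrate.quad` for the integral in (7), and uses hypothetical parameter values and function names; it is an illustration, not the authors' code. Stopping the integral where $y_2$ would become negative is our own guard.

```python
import math

from scipy.integrate import quad


def p_i1(t1, lam11):
    """Equation (6) with an exponential lag time: lam11 * exp(-lam11 * t1)."""
    return lam11 * math.exp(-lam11 * t1)


def joint_density(y1, y2, lam21, lam22, alpha):
    """Equation (3) for M = 2 when S(y1, y2) = exp{-(lam21*y1 + lam22*y2)**alpha}."""
    u = lam21 * y1 + lam22 * y2
    s = math.exp(-(u ** alpha))
    return lam21 * lam22 * s * (alpha ** 2 * u ** (2 * alpha - 2)
                                + alpha * (1 - alpha) * u ** (alpha - 2))


def p_i2(t_left, t1, lam21, lam22, alpha, wait=10.0):
    """Equation (7): integrate the first lag y1 over (0, t_left) with the second lag
    fixed at t1 - wait - y1 (our added guard stops where that would be negative)."""
    upper = min(t_left, t1 - wait)
    if upper <= 0.0:
        return 0.0  # trajectory 2 is impossible for this subject
    value, _abs_err = quad(
        lambda y1: joint_density(y1, t1 - wait - y1, lam21, lam22, alpha), 0.0, upper
    )
    return value


def expected_etas(theta1, theta2, p1, p2):
    """Expected eta_i1 and eta_i2 at one MCMC iteration (eta_i0 = 0 is known)."""
    w1, w2 = theta1 * p1, theta2 * p2
    return w1 / (w1 + w2), w2 / (w1 + w2)


# Hypothetical subject and parameter values: t_iL = 3, observed screening at t_i1 = 15.
p1 = p_i1(15.0, lam11=0.09)
p2 = p_i2(3.0, 15.0, lam21=0.5, lam22=1.05, alpha=0.9)
print(expected_etas(theta1=0.25, theta2=0.25, p1=p1, p2=p2))
```

With $\alpha = 1$ the integral has the closed form $\lambda_{21}\lambda_{22} e^{-\lambda_{22} c}(1 - e^{-(\lambda_{21}-\lambda_{22}) t_{iL}})/(\lambda_{21}-\lambda_{22})$, with $c = t_{i1} - 10$ (for $\lambda_{21} \ne \lambda_{22}$ and $t_{iL} \le c$), which is a convenient check on the numerical integration.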


3.6 Parameter Estimation 

Because the likelihood is high-dimensional and the observed screening colonoscopies are sparse, we use a Gibbs sampler [10] to estimate the posterior distribution of each parameter, iterating through the following steps:

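1. Draw $\theta$ from the conditional posterior distribution

$$\pi(\theta \mid \gamma, \eta) \propto \prod_{i=1}^{n} \prod_{j=0}^{\ell} \theta_j^{\eta_{ij}} \prod_{j=0}^{\ell} \theta_j^{\gamma_j - 1}, \quad \text{i.e.,} \quad \theta \mid \gamma, \eta \sim \mathrm{Dir}\left(\gamma_0 + \sum_{i=1}^{n} \eta_{i0}, \ldots, \gamma_\ell + \sum_{i=1}^{n} \eta_{i\ell}\right),$$

where $\mathrm{Dir}(\cdot)$ denotes the Dirichlet distribution [8].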

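2. Sample each $\gamma_j$, $j = 0, \ldots, \ell$, the parameters of the Dirichlet prior for $\theta$, assuming an exponential hyperprior with parameter $\delta_j$, from the conditional posterior distribution

$$\pi(\gamma_j \mid \theta, \gamma_{-j}) \propto \frac{\Gamma\left(\sum_{k=0}^{\ell} \gamma_k\right)}{\Gamma(\gamma_j)}\, \theta_j^{\gamma_j - 1} \exp\{-\delta_j \gamma_j\},$$

where $\gamma_{-j}$ denotes the vector of $\gamma$ parameters without the $j$th element.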
3. Sample each element of $\phi_j = (\phi_{j1}, \ldots, \phi_{jq})$, for all $j = 1, \ldots, \ell$. If $\phi_{jk} \sim g_{jk}(\cdot \mid \kappa_{jk})$, $k = 1, \ldots, q$, the conditional distribution is

$$\pi(\phi_{jk} \mid W, \eta, \kappa_{jk}) \propto \prod_{i=1}^{n} p_{ij}(W_i \mid \phi_j, \alpha, M = j)^{\eta_{ij}}\, g_{jk}(\phi_{jk} \mid \kappa_{jk}),$$

where $\kappa_{jk}$ is the parameter vector for the prior distribution of $\phi_{jk}$.

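In an exploratory examination of colonoscopy screening patterns observed in the SEER-Medicare data set, the hazard rate of the lag times was very flat (see Figure 3), so an exponential distribution for the lag times was used in our analysis of lifetime colonoscopy screening patterns (i.e., $f(t) = \lambda \exp\{-\lambda t\}$). In the exponential case, $\phi_j = (\lambda_{j1}, \ldots, \lambda_{jj})$, as each of the $j$ screenings has one associated parameter, and the cumulative hazard is $\Lambda_{jk}(t) = \lambda_{jk} t$. Let $\lambda_{jk} \sim \mathrm{Gamma}(\kappa_{jk1}, \kappa_{jk2})$ (shape $\kappa_{jk1}$, scale $\kappa_{jk2}$), with the following posterior distribution:

$$\pi(\lambda_{jk} \mid W, \eta, \kappa_{jk}) \propto \prod_{i=1}^{n} p_{ij}(W_i \mid \lambda_j, \alpha, M = j)^{\eta_{ij}}\, \lambda_{jk}^{\kappa_{jk1} - 1} \exp\{-\lambda_{jk}/\kappa_{jk2}\},$$

for $k = 1, \ldots, j$ and $j = 1, \ldots, \ell$.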

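4. Sample each element of $\kappa_{jk}$, the prior parameters for $\phi_{jk}$, from its conditional posterior distribution. In the exponential case, $\kappa_{jk} = (\kappa_{jk1}, \kappa_{jk2})$, with $\kappa_{jk1} \sim \mathrm{Exp}(b_{jk})$ and $\kappa_{jk2} \sim \mathrm{IG}(c_{jk}, d_{jk})$, where $\mathrm{IG}(\cdot)$ is the inverse gamma distribution. In this instance, the conditional posterior distributions become

$$\pi(\kappa_{jk1} \mid \lambda_{jk}, \kappa_{jk2}, b_{jk}) \propto \frac{\lambda_{jk}^{\kappa_{jk1} - 1}}{\Gamma(\kappa_{jk1})\, \kappa_{jk2}^{\kappa_{jk1}}} \exp\{-b_{jk} \kappa_{jk1}\},$$

$$\pi(\kappa_{jk2} \mid \lambda_{jk}, \kappa_{jk1}, c_{jk}, d_{jk}) = \mathrm{IG}(\kappa_{jk1} + c_{jk},\, \lambda_{jk} + d_{jk}).$$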


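5. Sample the correlation parameter $\alpha$ from the conditional posterior

$$\pi(\alpha \mid W, \eta, \phi, \tau) \propto \prod_{i=1}^{n} \prod_{j=1}^{\ell} p_{ij}(W_i \mid \phi_j, \alpha, M = j)^{\eta_{ij}} \times \alpha^{\tau_1 - 1}(1 - \alpha)^{\tau_2 - 1},$$

where $\alpha \sim \mathrm{Beta}(\tau_1, \tau_2)$.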

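6. Sample the prior parameters for $\alpha$, $\tau_1$ and $\tau_2$, with both parameters distributed $\mathrm{Exp}(1)$, such that the conditional posterior distributions are

$$\pi(\tau_1 \mid \alpha, \tau_2) \propto \frac{\Gamma(\tau_1 + \tau_2)}{\Gamma(\tau_1)}\, \alpha^{\tau_1 - 1} \exp\{-\tau_1\},$$

$$\pi(\tau_2 \mid \alpha, \tau_1) \propto \frac{\Gamma(\tau_1 + \tau_2)}{\Gamma(\tau_2)}\, (1 - \alpha)^{\tau_2 - 1} \exp\{-\tau_2\}.$$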

7. Update each unobserved $\eta_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, \ell$, with its expectation, using the Gibbs sampler values of the parameters at that iteration.
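The conjugate steps above, and one generic way to handle the non-conjugate ones, can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the text does not say how the non-conjugate full conditionals (steps 2, 3, 5 and 6) are sampled, so the random-walk Metropolis update on the log scale used here for $\gamma_j$ is our choice. It applies to positive parameters only ($\alpha$ in step 5 would need a proposal that respects $(0, 1]$). All function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln


def draw_theta(eta, gamma, rng):
    """Step 1: theta | gamma, eta ~ Dirichlet(gamma_j + sum_i eta_ij), j = 0..ell."""
    concentration = np.asarray(gamma, dtype=float) + np.asarray(eta, dtype=float).sum(axis=0)
    return rng.dirichlet(concentration)


def draw_kappa2(lam, kappa1, c, d, rng):
    """Step 4: kappa_jk2 ~ IG(kappa_jk1 + c_jk, lambda_jk + d_jk).
    An IG(shape, rate) draw is the reciprocal of a Gamma(shape, scale = 1 / rate) draw."""
    shape, rate = kappa1 + c, lam + d
    return 1.0 / rng.gamma(shape, 1.0 / rate)


def log_post_gamma_j(g, j, gamma, theta, delta):
    """Step 2: log of Gamma(sum_k gamma_k) / Gamma(gamma_j) * theta_j**(gamma_j - 1)
    * exp(-delta_j * gamma_j), evaluated at gamma_j = g."""
    if g <= 0.0:
        return -np.inf
    gam = np.array(gamma, dtype=float)
    gam[j] = g
    return gammaln(gam.sum()) - gammaln(g) + (g - 1.0) * np.log(theta[j]) - delta[j] * g


def metropolis_log_scale(x, log_post, step, rng):
    """One random-walk Metropolis update on log(x) for a positive parameter x.
    The term log(x_new) - log(x) is the Jacobian of the log transform."""
    x_new = x * np.exp(step * rng.standard_normal())
    log_ratio = log_post(x_new) - log_post(x) + np.log(x_new) - np.log(x)
    return x_new if np.log(rng.random()) < log_ratio else x


rng = np.random.default_rng(1)
eta = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.4, 0.6]])  # n = 3, ell = 2
gamma = [1.0, 1.0, 1.0]
delta = [1.0, 1.0, 1.0]
theta = draw_theta(eta, gamma, rng)
gamma[0] = metropolis_log_scale(
    gamma[0], lambda g: log_post_gamma_j(g, 0, gamma, theta, delta), 0.5, rng
)
kappa2 = draw_kappa2(lam=0.5, kappa1=2.0, c=1.0, d=1.0, rng=rng)
```

Fractional rows of `eta`, such as the third subject's, are the expected indicators from step 7; they enter the Dirichlet update exactly as known indicators do.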
3.7 Covariates


Following previous work by Ghitany and Maller [12] and others [23, 5], covariates are added to the model by incorporating them into the parameter(s) of interest. The probability that individual $i$ has $j$ lifetime colonoscopy screenings can be modeled as $\theta_{ij} = \mathrm{expit}(X_i'\beta_j)$, where $X_i'$ is a $1 \times p_j$ covariate vector for the $i$th subject, and $\beta_j = (\beta_{j1}, \ldots, \beta_{jp_j})'$ are the effects of the covariates on the probability of $j$ lifetime colonoscopy screenings. The expit function, defined as

$$\mathrm{expit}(a) = \frac{\exp\{a\}}{1 + \exp\{a\}},$$

is used to ensure that the resulting estimates lie between 0 and 1.

The elements of the parameter vector $\phi_j$ can be modeled in a similar fashion using an appropriate link function. If $\phi_j$ has only one element, then $\phi_{ij} = h^{-1}(Z_i'\omega_j)$, where $Z_i$ is the covariate vector for the $i$th subject, $\omega_j$ are the effects of the covariates on the lag time, and $h(\cdot)$ is an appropriate link function. (If $\phi_j$ has more than one element, each element may be modeled with the same covariates and link function, or these may vary based on the constraints on the parameters in $\phi_j$ and the biological rationale behind the covariate modeling.) In the exponential example, the link function $h(\cdot)$ must ensure that the parameter $\lambda_j$ is positive. A natural choice is the log link, $\lambda_{ij} = \exp\{Z_i'\omega_j\}$, so that the resulting $\lambda_{ij}$ are all greater than 0.
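Both link functions act row by row on a covariate matrix. In the sketch below, the covariate (an indicator for coverage “phase 2”), the coefficient values and the function names are hypothetical; `scipy.special.expit` is the logistic function defined above.

```python
import numpy as np
from scipy.special import expit


def theta_ij(X, beta_j):
    """theta_ij = expit(X_i' beta_j) for each row X_i of the covariate matrix X."""
    return expit(X @ beta_j)


def lambda_ij(Z, omega_j):
    """lambda_ij = exp(Z_i' omega_j); the exponential keeps every rate positive."""
    return np.exp(Z @ omega_j)


# Hypothetical design: intercept plus an indicator for coverage "phase 2".
X = np.array([[1.0, 0.0],
              [1.0, 1.0]])
beta_1 = np.array([-0.5, 0.8])   # illustrative coefficients, not estimates
omega_1 = np.array([-1.0, 0.3])  # illustrative coefficients, not estimates
print(theta_ij(X, beta_1))
print(lambda_ij(X, omega_1))
```

Separate expit links keep each $\theta_{ij}$ in $(0, 1)$ but do not, by themselves, force $\sum_{j=1}^{\ell} \theta_{ij} \le 1$ when $\ell \ge 2$.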

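The likelihood function in equation (4) can then be rewritten as

$$L(W, X, Z, \eta \mid \omega, \beta, \alpha) = \prod_{i=1}^{n} \prod_{j=0}^{\ell} \left[ \theta_{ij}\, p_{ij}(W_i \mid \omega_j, \alpha, M = j) \right]^{\eta_{ij}},$$

with $\theta_{ij} = \mathrm{expit}(X_i'\beta_j)$.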

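A Gibbs sampler similar to that presented in Section 3.6 is used for estimation. However, instead of sampling $\theta$, each element of $\beta_j$, $j = 1, \ldots, \ell$, is sampled from the posterior

$$\pi(\beta_{jk} \mid X, \eta) \propto \prod_{i=1}^{n} \left(\mathrm{expit}(X_i'\beta_j)\right)^{\eta_{ij}}\, \pi(\beta_{jk}), \qquad k = 1, \ldots, p_j,$$

with $\beta_{jk} \sim N(0, \sigma^2_{jk})$, $k = 1, \ldots, p_j$.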

Similar methods can be used for the estimation of $\omega_j$, $j = 1, \ldots, \ell$, replacing the steps for sampling $\phi_j$ and its prior parameters with steps for sampling each element of $\omega_j$ and the associated prior parameters.


4 Simulation Studies 

To assess the efficiency, accuracy and consistency of our method and algorithm in the SEER-Medicare data context, we conducted a simulation study for the multivariate screening case. We set the maximum number of possible lifetime screenings at two ($\ell = 2$), which is the maximum number of observed colonoscopy screenings in the SEER-Medicare data set and a value consistent with medical practice in the oldest old. Data were generated by varying the percentages of subjects with 0, 1, or 2 lifetime screenings, and by assuming different lag times for subjects with only one screening compared with subjects with two screenings. As suggested by the SEER-Medicare data (see Figure 3), we assumed an exponential distribution for the lag times. Under this distribution, the multivariate survival distribution for subjects with two screenings becomes

$$P(T_1 > t_1, T_2 > t_2 \mid \lambda_2, \alpha, M = 2) = \exp\{-(\lambda_{21} t_1 + \lambda_{22} t_2)^{\alpha}\}, \tag{8}$$
Scenario   Only 1 screening:    Two screenings:        Two screenings:
           lag time (λ11)       1st lag time (λ21)     2nd lag time (λ22)
LT1        4.3 (0.02)           1 (0.70)               1 (0.70)
LT2        3.5 (0.09)           4/3 (0.50)             2/3 (1.05)
LT3        2.25 (0.35)          4/3 (0.50)             2/3 (1.05)

Table 1: Three lag time (“LT”) scenarios used to generate the data sets in our simulation study. The lag times shown are the median lag times (in years) from the survival distribution used to generate the lag time to colonoscopy screening for individuals who are screened at least once in their lifetime.




Scenario   Probability of 0 screenings (θ0)   Probability of 1 screening (θ1)   Probability of 2 screenings (θ2)
NLS1       1/3                                1/3                               1/3
NLS2       0.5                                0.25                              0.25

Table 2: Two scenarios for the number of lifetime screenings (“NLS”) used to generate the number of lifetime screenings (0, 1, or 2) for each subject in the simulated data sets.


where $\lambda_2 = (\lambda_{21}, \lambda_{22})$, $\lambda_{21} t_1$ is the cumulative hazard for the lag time to the first of two screenings, and $\lambda_{22} t_2$ is the cumulative hazard for the lag time to the second of two screenings. For subjects who receive only one lifetime screening, the survival function reduces to the standard survival function for the exponential distribution,

$$P(T > t \mid \lambda_{11}, M = 1) = \exp\{-\lambda_{11} t\}.$$

To prevent identifiability issues, we assumed that the maximum possible lag time was ten years (subjects who are overdue by more than ten years are no longer “average-risk” due to the rate of colorectal cancer progression [39]), and that subjects were only eligible for colonoscopy screenings between the ages of 50 and 90 years. A colonoscopy screening is rarely recommended for a patient over 90 because the risks associated with the procedure outweigh its long-term benefits [30]. The length of the simulated study was 25 years. Three lag time scenarios, denoted ‘LT1’, ‘LT2’, and ‘LT3’, were used to generate data (see Table 1). These were paired with two scenarios for the number of lifetime screenings, denoted ‘NLS1’ and ‘NLS2’ (see Table 2). The correlation parameter $\alpha$ was set at 0.9 (low correlation between screenings), as suggested by the SEER-Medicare data. The parameter values used to simulate data were chosen based on observed lag times in the SEER-Medicare data set and the possible true number of lifetime screenings. The pairings produced six types of simulated data sets, each containing 1000 subjects and generated 200 times. Left- and right-censoring percentages were approximately 50% and 40%, respectively, in the simulated data sets. About 40% of subjects had at least one observed screening, and about 15% of subjects had two observed screenings.
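The data-generating step for a single subject can be sketched as follows. This is a simplified illustration, not the simulation code used for the study: it draws the number of lifetime screenings from $(\theta_0, \theta_1, \theta_2)$ and draws each lag time from an exponential distribution truncated at the ten-year maximum by inverse-CDF sampling, but it treats the two lag times as independent (ignoring $\alpha$) and omits ages, the 25-year study window, and censoring. The function names and rate values are hypothetical.

```python
import math
import random

MAX_LAG = 10.0  # maximum possible lag time (years), as assumed in the simulation design

def draw_truncated_exp(rng: random.Random, lam: float, upper: float = MAX_LAG) -> float:
    """Inverse-CDF draw from an exponential(lam) distribution restricted to [0, upper)."""
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-lam * upper))) / lam

def simulate_subject(rng, probs, lam11, lam21, lam22):
    """Lag times for one subject; an empty list means never screened ("cured").

    probs = (theta0, theta1, theta2). The two lag times are drawn independently,
    i.e. the within-subject correlation alpha is ignored in this sketch."""
    m = rng.choices([0, 1, 2], weights=probs)[0]
    if m == 0:
        return []
    if m == 1:
        return [draw_truncated_exp(rng, lam11)]
    return [draw_truncated_exp(rng, lam21), draw_truncated_exp(rng, lam22)]

rng = random.Random(2024)
# NLS1 probabilities (Table 2) with hypothetical rate parameters
subjects = [simulate_subject(rng, (1/3, 1/3, 1/3), 0.35, 0.70, 1.05) for _ in range(1000)]
print(sum(1 for s in subjects if not s), "of 1000 subjects never screened")
```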

Markov chain Monte Carlo (MCMC) chains were run on each data set, with the first 10,000 iterations discarded as burn-in, leaving a total of 40,000 thinned iterations in each chain for analysis. Point estimates were calculated as the median of the marginal posterior distribution of each parameter.
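Post-processing a single chain can be sketched in Python as below. The thinning interval is not stated in the text, so `thin` is a hypothetical argument, and the helper (also hypothetical) additionally returns the 2.5% and 97.5% quantiles that form the 95% central credible intervals used later in the paper.

```python
import statistics

def summarize_chain(draws, burn_in, thin=1):
    """Drop burn-in draws, keep every `thin`-th draw, and summarize.

    Returns (posterior median, 2.5% quantile, 97.5% quantile)."""
    kept = list(draws[burn_in::thin])
    if len(kept) < 2:
        raise ValueError("too few draws left after burn-in and thinning")
    # quantiles(n=40) returns 39 cut points at 1/40, ..., 39/40;
    # the first and last are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(kept, n=40, method="inclusive")
    return statistics.median(kept), cuts[0], cuts[-1]
```

For a list of raw draws `chain`, `summarize_chain(chain, burn_in=10_000, thin=k)` would mirror the setup described above for some thinning interval `k`.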
4.1 Simulation Results


Performance of the algorithm was assessed through the bias and the root mean square error (RMSE).
The bias was calculated as the average difference between the parameter estimate and the true value of the parameter, 
and the RMSE was calculated as the square root of the average squared difference between the parameter estimate 
and the true value of the parameter. 
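The two performance measures can be computed as in this short sketch; the function name and the replicate estimates are hypothetical, and dividing the RMSE by 10 mirrors how the RMSEs below are expressed as a share of the 10-year lag-time range.

```python
import math

def bias_and_rmse(estimates, true_value):
    """Bias = mean(est - truth); RMSE = sqrt(mean((est - truth)^2))."""
    if not estimates:
        raise ValueError("need at least one estimate")
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse

# Hypothetical posterior medians of one median lag time from five replicate data sets
bias, rmse = bias_and_rmse([1.2, 0.8, 1.5, 0.9, 1.1], true_value=1.0)
print(round(bias, 3), round(rmse, 3), f"{rmse / 10:.1%} of a 10-year range")
```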

Across all data set variations, the average bias (RMSE) of the median lag times $\gamma_{11}$, $\gamma_{21}$, and $\gamma_{22}$ was 0.03 (0.41), 0.20 (0.70), and 0.91 (1.89) years for the only lifetime screening, the first of two screenings, and the second of two screenings, respectively. Over a possible 10-year lag time, these RMSEs represent 4.1%, 7%, and 18.9% of the total possible range. The bias and RMSE of $\gamma_{11}$, the median of the lag time to the only lifetime screening, change little across the different LT and NLS scenarios. The median lag time with the largest RMSE is the second of two lag times, as it has the fewest observations contributing to parameter estimation; its largest values occur in the data sets with the longest lag times and the lower probability of two observed screenings. However, the overall RMSE is small, with an average bias of less than 1 year (over a 10-year range) across all data sets.

The bias (RMSE) for the screening probabilities is small, with values equal to $-9.67 \times 10^{-4}$ (0.02), $-0.01$ (0.02), and 0.01 (0.03) for the probability of no lifetime screenings ($\theta_0$), the probability of one lifetime screening ($\theta_1$), and the probability of two lifetime screenings ($\theta_2$), respectively. While $\theta_2$ has the largest RMSE, its magnitude (3%) is very small.

Table 3 presents the bias (RMSE) of the parameter estimates across all simulated data sets. The method estimates the screening probabilities and the lag times precisely, although larger bias and RMSE values are seen for the probability of two screenings and for the second lag time. This is expected, as these parameters have the fewest screenings contributing observed data.


5 Application to SEER-Medicare Data 

We applied our multivariate survival model to the SEER-Medicare data set to investigate colonoscopy screening behavior between 1991 and 2003, assuming a maximum of two lifetime screenings, as that is the maximum number we observed in our data set. After the removal of subjects who used other CRC screening methods (such as fecal occult blood tests or sigmoidoscopy), this data set contains 403,842 individuals aged 65 or older at study entry. Individuals were considered eligible for screening colonoscopy in 1991; while current screening guidelines recommend screening starting at age 50, very few people received colonoscopies before the early 1990s (the USPSTF did not provide official guidelines until 1996 [30]), and therefore the probability of an unobserved screening on an average-risk individual occurring before 1991 is very small. Among these subjects, 62% were left-censored, with an average left-censoring time of 4.5 years (range: 0.1 - 9.9 years). In addition, 22% had one observed colonoscopy, and 0.11% had two observed colonoscopies. Among individuals who had at least one colonoscopy, the median lag time before the first observed screening was 5.7 years (range: 0.01 to 10 years), and among individuals with two colonoscopies, the median lag time before the second observed screening was 0.1 years (range: 0.01 to 2.7 years). An approximate hazard rate indicated that the rate of screening was roughly constant (see Figure 3), so we assumed an exponential distribution for all lag times.
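The crude hazard approximation behind Figure 3 divides the number of observed failures by the number of subjects at risk; the paper computed it with survfit() and smoothed it with smooth.spline() in R. The plain-Python sketch below shows only the unsmoothed step, assuming fixed-width time intervals; the function name and the simulated data are hypothetical. For exponential lag times with rate 0.2 per year, the per-interval values stay close to $1 - e^{-0.2} \approx 0.18$, the kind of flat pattern that motivates the exponential assumption.

```python
import random

def crude_hazard(times, events, width, n_intervals):
    """Crude hazard per interval: failures in [a, a + width) divided by
    (number at risk at a) * width. With width = 1 this is failures / at risk."""
    rates = []
    for k in range(n_intervals):
        start, stop = k * width, (k + 1) * width
        at_risk = sum(1 for t in times if t >= start)
        failures = sum(1 for t, d in zip(times, events) if d and start <= t < stop)
        rates.append(failures / (at_risk * width) if at_risk else float("nan"))
    return rates

# Toy check: exponential lag times (hypothetical rate 0.2 per year) with
# administrative right-censoring at 5 years.
rng = random.Random(1)
raw = [rng.expovariate(0.2) for _ in range(20_000)]
times = [min(t, 5.0) for t in raw]
events = [t < 5.0 for t in raw]
print([round(h, 3) for h in crude_hazard(times, events, 1.0, 5)])
```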

We examined both univariate models (estimating the time to the first screening), with and without a covariate for insurance coverage levels, and multivariate models estimating parameters for zero to two screenings. The univariate models were created to provide initial estimates of the median time to the first screening (regardless of whether it was the only screening or the first of two), as well as the probability that an individual would never be screened. In addition, we were able to include an insurance coverage covariate in the univariate model, which allowed us to quantify the effect of at least some insurance coverage on the probability of receiving at least one lifetime
Model                         LT1             LT2             LT3             Total (over θ)
             Parameter    Bias    RMSE    Bias    RMSE    Bias    RMSE    Bias    RMSE
NLS1
             θ0           0.00    0.02    0.00    0.02    0.00    0.02    0.00    0.02
             θ1          -0.01    0.03    0.00    0.02   -0.01    0.02   -0.01    0.02
             θ2           0.02    0.03    0.00    0.02    0.01    0.02    0.01    0.03
             γ11         -0.02    0.59   -0.02    0.28    0.00    0.25   -0.02    0.42
             γ21          0.20    0.77    0.27    0.71    0.34    0.99    0.28    0.84
             γ22          1.02    2.00    0.32    0.90    0.54    1.43    0.66    1.55
             α            0.03    0.11    0.00    0.10    0.02    0.09    0.01    0.10
NLS2
             θ0          -0.01    0.02    0.00    0.02    0.00    0.02   -0.01    0.02
             θ1          -0.01    0.03    0.00    0.02   -0.01    0.02   -0.01    0.02
             θ2           0.03    0.04    0.00    0.02    0.01    0.03    0.01    0.03
             γ11          0.14    0.50    0.05    0.38    0.05    0.23    0.07    0.40
             γ21          0.05    0.11    0.18    0.69    0.09    0.45    0.12    0.52
             γ22          1.93    2.78    0.35    1.17    1.14    2.20    1.16    2.17
             α            0.06    0.09    0.02    0.07    0.04    0.07    0.04    0.08
Total (over the lag times)
             θ0          -0.01    0.02    0.00    0.02    0.00    0.02    0.00    0.02
             θ1          -0.01    0.03    0.00    0.02   -0.01    0.02   -0.01    0.02
             θ2           0.02    0.03    0.00    0.02    0.01    0.03    0.01    0.03
             γ11          0.04    0.57    0.02    0.34    0.03    0.24    0.03    0.41
             γ21          0.13    0.57    0.23    0.72    0.23    0.79    0.20    0.70
             γ22          1.51    2.45    0.37    1.09    0.86    1.88    0.91    1.89
             α            0.04    0.10    0.01    0.09    0.03    0.08    0.03    0.09
Table 3: Summary bias and RMSE results for the simulation studies across all data sets. The estimates of the screening probabilities are close to the true parameter values, with RMSE values less than or equal to 3% for all values of $\theta$, and the RMSE of the median lag times, denoted $\gamma$, ranges from 3 months to 2.5 years (2.5% to 20% of the possible lag time range). The parameter with the highest variation and bias is the median of the second of two lag times, $\gamma_{22}$ (a function of $\lambda_{22}$), which is expected as the percentage of subjects with two observed screenings is small.
Figure 3: Approximated hazard rate of the time to the first screening between 1991 and 1996 (before Medicare insurance coverage changes or guidelines were set). The hazard rate is very flat, providing evidence that an exponential parametric distribution is appropriate. The hazard rate is approximated by dividing the number of observed failures by the number of subjects at risk (as provided by survfit() in R), and then smoothed using smooth.spline() in R.


colonoscopy screening. We then examined multivariate models with up to two lifetime screenings. This allowed us to quantify differences between the lag time to the only lifetime screening and the lag time to the first of two lifetime screenings, and to examine whether the lag time to the first screening was longer or shorter than the lag time to the second screening. For both the univariate and multivariate models, we compared the results when individuals were eligible for screening until age 75 with those when they were eligible until age 80, consistent with current screening guidelines [30]. In the multivariate model, we assumed a maximum lag time of ten years to prevent identifiability issues.

Five separate Markov chain Monte Carlo (MCMC) chains were run for each model, each with a burn-in of 5,000 iterations, leaving a total of 15,000 thinned iterations in each chain for analysis. Convergence was assessed through the Geweke diagnostic [11], graphical diagnostics (such as trace plots and density plots), and Gelman-Rubin tests [9, 4]. Point estimates were calculated as the medians of the marginal posterior distributions of each parameter, and 95% central credible intervals were used for inference.
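As a rough illustration of the Gelman-Rubin check, the sketch below computes the basic (non-split) potential scale reduction factor for one scalar parameter from several chains. The exact variant used in the analysis is not stated, so this form is an assumption, and the function name is hypothetical; values close to 1 (a common rule of thumb is below 1.1) suggest that the chains have mixed.

```python
import math
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor R-hat for one scalar parameter.

    chains: list of m >= 2 equal-length lists of post-burn-in draws."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(means)
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)  # between-chain
    w = statistics.fmean(statistics.variance(c) for c in chains)   # within-chain
    if w == 0:
        raise ValueError("within-chain variance is zero")
    var_hat = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return math.sqrt(var_hat / w)
```

With five chains that have identical means, R-hat is slightly below 1; chains centered at clearly different values give R-hat well above 1.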

5.1 Univariate model results 

We first examined univariate models, which provided initial estimates of the probability of never receiving a colonoscopy screening and of the time to the first screening (based on the likelihood function in equation 5). The simple univariate model shows that before age 75, approximately 38% (95% CI: 37.6% - 37.9%) of the Medicare population gets a colonoscopy screening, with a median lag time (calculated from $\lambda$) of about 5.2 years (95% CI: 5.14 - 5.20 years). These numbers change slightly when the maximum screening age is raised to 80, with slight increases in screening rates as well as increases in median lag times (see Table 4).

To determine the impact of changes in levels of insurance coverage for colonoscopy screenings (i.e. differences between coverage phase 0, phase 1, and phase 2) on colonoscopy screening rates, we included a covariate in the model for $\theta$ as follows:

$$\theta = \operatorname{expit}\left(\beta_0 + \beta_1 I\{\text{study entry after 1998}\}\right).$$

In the covariate model, the baseline group (represented by $\beta_0$) consists of subjects who became eligible for screening when no colonoscopy coverage was offered, and $\beta_1$ represents the change in this probability (on the logit scale) when some or all coverage was
Univariate model   Parameter                   Cap at age 75              Cap at age 80
                                               Estimate (95% CI)          Estimate (95% CI)
Simple model       θ                           0.622 (0.621, 0.624)       0.554 (0.552, 0.556)
                   Median time to screening    5.166 (5.135, 5.197)       6.533 (6.486, 6.576)
Covariate model    θ, no coverage              0.639 (0.637, 0.641)       0.570 (0.568, 0.572)
                   θ, some coverage            0.508 (0.502, 0.513)       0.404 (0.398, 0.411)
                   Median time to screening    5.416 (5.381, 5.451)       6.896 (6.844, 6.945)

Table 4: Univariate model results showing the medians and 95% credible intervals (the 2.5% and 97.5% quantiles of the MCMC chain for each parameter) of the posterior distributions for the probability of never being screened for colorectal cancer ($\theta$) and the median lag time (as a function of $\lambda$) to the first screening for colorectal cancer in the SEER-Medicare data set.


available. (A covariate was not included in the parameter for the lag time, as there was not enough information to reliably run the Gibbs sampler for this particular data set.) Results from the covariate model show that screening rates increased by almost 15 percentage points for subjects aged 75 and younger when at least some insurance coverage was offered, and by almost 17 percentage points for subjects aged 80 and younger. These results show that providing at least some insurance coverage for colonoscopy screenings dramatically improves the rate of screening (see Table 4).

Figure 4 shows estimated “survival curves” (i.e. the probability of no lifetime screening colonoscopy) for subjects who have no colonoscopy insurance coverage compared to subjects who have some or all colonoscopy insurance coverage. In our analysis of colorectal cancer screening, a higher survival curve indicates a worse screening pattern (i.e. lower screening rates and longer lag times), and, not surprisingly, the subjects who had no colonoscopy coverage had lower rates of screening. Among patients eligible for screening up to age 75, 26% of patients without coverage were screened by age 60, and 33% of patients without coverage were screened by age 70. However, when at least some coverage for colonoscopy was available, 36% of patients were screened by age 60, and 45% of patients were screened by age 70. This can also be observed in Figure 5, which shows the posterior densities of the survival probabilities (i.e. the probability of no colonoscopy screening) for subjects at age 55, age 60, and age 70. At age 55, the two densities are closest together, and each density is narrow. By age 70, the densities are farther apart, meaning that differences in screening patterns between subjects with and without insurance coverage grow with increasing age. Note that in all three graphs the densities do not overlap, providing evidence that the probability of never being screened via colonoscopy is statistically significantly different for subjects with some insurance coverage compared to those with no insurance coverage.
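The percentages above can be approximately reproduced from Table 4, assuming the standard mixture cure form $P(\text{not screened by } t) = \theta + (1 - \theta)e^{-\lambda t}$ with $\lambda = \ln 2 / \text{median}$, a lag time measured from eligibility at age 50, and a median lag time shared by both coverage groups (the covariate enters only $\theta$). These are our assumptions for illustration, and the function name is hypothetical.

```python
import math

def prob_not_screened(t: float, theta: float, median_lag: float) -> float:
    """Mixture-cure survival P(not screened by time t).

    theta is the probability of never being screened; the screened group is
    assumed to have an exponential lag time with rate ln(2) / median_lag."""
    lam = math.log(2.0) / median_lag
    return theta + (1.0 - theta) * math.exp(-lam * t)

MEDIAN_LAG = 5.416  # covariate model, cap at age 75 (Table 4); shared by both groups
THETA = {"no coverage": 0.639, "some coverage": 0.508}  # Table 4, cap at age 75

screened = {}
for group, theta in THETA.items():
    for age in (60, 70):
        t = age - 50  # assumption: lag time measured from eligibility at age 50
        screened[(group, age)] = 1.0 - prob_not_screened(t, theta, MEDIAN_LAG)

for key, value in screened.items():
    print(key, f"{value:.1%}")
```

Under these assumptions the script gives about 26.1% and 33.3% (no coverage) and 35.5% and 45.4% (some coverage) screened by ages 60 and 70, in line with the rounded figures reported above.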

5.2 Multivariate model results 

In the multivariate case, we allowed a maximum of two lifetime screenings, as no individuals in our data set had three or more observed colonoscopies, and we assumed both lag times were exponentially distributed. No covariates were included in the multivariate model; by the nature of the multiple screenings model, the lag times and screening percentages at different points in the study inherently reflect temporal changes such as insurance coverage levels.

Results show that up to age 75, the probability of never being screened is approximately 68% (95% CI: 67.6% - 67.9%). The probability of one lifetime screening is about 27% (95% CI: 26.8% - 27.0%), and the probability of two lifetime screenings is about 5% (95% CI: 5.3% - 5.4%). Among subjects who are only screened once, the estimated median lag time is 2.5 years (95% CI: 2.53 - 2.55 years). Among subjects who are screened twice, the median lag time
Figure 4: Univariate model results showing estimated survival curves (i.e. the probability of not being screened via colonoscopy) for the time to the first screening, comparing subjects with no colonoscopy insurance coverage to those with at least some colonoscopy insurance coverage. In the colonoscopy screening context, a high survival curve indicates a poor rate of screening. It can be observed that the subjects with the higher survival curve are those without colonoscopy coverage.
Figure 5: Univariate model results showing the densities for the probability of not being screened for colorectal cancer via colonoscopy by age 55, 60, and 70 years, comparing those with no coverage (solid line) and those with at least some coverage (dashed line). Note that while the x-axes of the three graphs cover different ranges of probability values, each range spans 15 percentage points. The two densities are closest together at 55 years and farthest apart at 70 years. In all three graphs the densities do not overlap, providing evidence that the probability of screening under some and no coverage is statistically significantly different at each time point. The densities are calculated using the density() function in R on the MCMC chain of survival probabilities, calculated at each iteration of the thinned and burned-in chains. Results are shown for the model with a maximum screening age of 75 years.


for the first screening is 1 year (95% CI: 1.01 - 1.06 years), and the median lag time for the second screening is 1.6 years (95% CI: 1.57 - 1.62 years). The parameter $\alpha$, which represents the correlation between screenings, is equal to 0.92 (95% CI: 0.912 - 0.923), which means the within-subject correlation between screenings is low. The estimates changed little when subjects were eligible for screening up to age 80 (see Table 5 for model results). Note that the probability of never being screened through colonoscopy before age 75 is similar in the multivariate and univariate models. Lag times differ between the univariate and multivariate models because the univariate models do not restrict the maximum possible lag time.

Estimated marginal survival curves (i.e. the probability of not being screened) for each of the lag times of the multivariate model are shown in Figure 6. As with the univariate model, high survival curves indicate a poor screening rate. The left panel shows that the proportion of subjects with two screenings is very low, and that differences between the first and second of two screenings are minor. The proportion of individuals receiving one lifetime colonoscopy screening is higher, but still poor. Five years after becoming eligible for screening, 26% of subjects have had one screening (either the only lifetime screening, or the first of two lifetime screenings). Five years after becoming due for the second screening, only 4.7% of subjects have had a second screening. The right panel shows the survival curves conditional on receiving one or two lifetime screenings (i.e. $\theta$ is not used in the calculation of the survival curve). These estimates show that among individuals who get two screenings, the time to the first screening is shorter than the time to the second screening, with 96% of these subjects getting the first screening within 5 years of becoming due, and 88% getting the second screening within 5 years of becoming due. Subjects who only get one lifetime colonoscopy screening wait longer, with 76% of these subjects getting screened within the first five years of becoming due. Our results show that while the actual rates of screening are poor, those who are getting screened are diligent, with a large majority of individuals getting screened within five years after becoming due. Figure 7
Parameter                               Cap at age 75              Cap at age 80
                                        Estimate (95% CI)          Estimate (95% CI)
Probability of no screenings            0.677 (0.676, 0.679)       0.686 (0.685, 0.688)
Probability of one screening            0.269 (0.268, 0.270)       0.224 (0.222, 0.225)
Probability of two screenings           0.054 (0.053, 0.054)       0.090 (0.089, 0.091)
Median time to only 1 screening         2.536 (2.526, 2.547)       2.687 (2.676, 2.698)
Median time to 1st of two screenings    1.037 (1.013, 1.064)       1.270 (1.253, 1.289)
Median time to 2nd of two screenings    1.597 (1.572, 1.620)       1.564 (1.540, 1.585)
Correlation parameter α                 0.917 (0.912, 0.923)       0.961 (0.956, 0.964)


Table 5: Median estimates and 95% credible intervals (the 2.5% and 97.5% quantiles of the MCMC chain for each parameter) for the probability of receiving none, one, or two screenings in a lifetime, the median time to the only screening or to the first and second of two screenings for colorectal cancer in the SEER-Medicare data set, and the parameter $\alpha$, which represents the correlation between screenings. Results are similar regardless of the maximum eligible age for screening.


shows the bivariate survival distribution in a contour plot. While the figure is fairly symmetric (meaning there is little difference between the lag time to the first screening and the lag time to the second screening), the grey shading extends slightly higher up the y-axis (the axis that denotes the time to the second screening), which means that the second screening is delayed slightly longer than the first. The bivariate survival distribution is only shown for the first five years, as the probabilities for years five through ten are very small and differences in the distribution after this time point are difficult to discern.


6 Discussion 

We have proposed a cure rate model for multivariate survival data that can account for both left- and right-censored data. We have developed the theory for the general case of multiple lifetime screenings, and then applied it to the case of two colonoscopy screenings. The case of two colonoscopy screenings in a lifetime is common, as beyond a certain age the risks of screening outweigh its long-term benefits, and screening is often not recommended in the later stages of life. This model provides robust estimates, even in the difficult setting of considerable left- and right-censoring, and with the inclusion of subjects who never get screened. Our approach provides estimates sufficiently accurate to detect both demographic differences and the time-varying impact of policy shifts.

Using this method, we have shown that many individuals are never screened for colorectal cancer, with an estimated probability of at least one screening of only about 30%. However, screening behavior improved dramatically following increases in Medicare payments, with an estimated reduction in the probability of never being screened for colorectal cancer of around 15 percentage points or more when colonoscopy coverage was provided. These results agree with previous work, which has shown that screening incidence is generally low, but can be improved with increased levels of coverage [40, 28]. In addition, subjects who do get screened are diligent, and do not wait long periods of time after becoming due for a screening. We have extended these results to quantify rates of screening incidence and how adherent individuals are to current screening guidelines.

Future work with this model and the SEER-Medicare data set includes linking lifetime screening behavior to cancer incidence rates, as well as including other screening modalities, such as sigmoidoscopy and FOBT. Such a link will greatly inform the debate on optimal screening guidelines, as well as improve current cost-benefit analyses of CRC screening and Medicare expenditures.

We have only presented simulations for the case of two lifetime screenings, which is reasonable for analysis of the 
Figure 6: LEFT: Multivariate model results showing the marginal estimated survival curves (i.e. probability of not being screened for colorectal cancer via colonoscopy) for the only lifetime screening (solid line), or the first and second of two screenings (dashed and dotted lines). As expected, more people get only one screening in a lifetime than get two lifetime screenings, and therefore that survival curve is lowest. RIGHT: The estimated survival curves conditional on the number of lifetime screenings. These curves show that among subjects who will receive two screenings, the first screening happens sooner than the second screening. Subjects who get only one colonoscopy take longer than those who get two colonoscopies. As with the univariate models, a high survival curve indicates poor screening rates. Estimates were calculated from the multivariate model that assumes 75 is the maximum eligible age for screening.
Figure 7: A contour plot of the multivariate model results showing the joint survival curve (i.e. the probability of not being screened via colonoscopy), conditional on subjects who get two screenings (i.e. the probability of two screenings, $\theta_2$, is not used in the calculation of the survival probabilities), for the first five years. The figure shows minor differences between the time to the first screening and the time to the second screening, as the contour plot is fairly symmetric. However, the grey shading extends slightly higher up the y-axis (which represents the time to the second screening), meaning that the probability of not being screened remains higher for a longer period before the second screening. As with the univariate models, a high survival curve indicates poor screening rates. Years 5 through 10 are omitted from the figure, as the probabilities are very small and difficult to discern. Estimates were calculated from the multivariate model that assumes 75 is the maximum eligible age for screening.


SEER-Medicare data set. The extension to three or more lifetime screenings is more computationally demanding, although it can be done with time and care. Our model has answered important questions about colorectal cancer screening behavior, and it is also broadly applicable to situations with multiple events where patterns may be unobserved before study entry or after study exit. These types of analyses will become more prevalent as time progresses, particularly with major changes in health care coverage due to the Affordable Care Act. Accurate assessment of patterns of lifetime preventive medical care will become more necessary as government-funded health care becomes more prevalent, and this information is required to determine the effectiveness of different medical procedures.